Multimodal Speaker Recognition in a Conversation Scenario

Authors

  • Maria Letizia Marchegiani
  • Fiora Pirri
  • Matia Pizzoli
Abstract

As a step toward the design of a robot that can take part in a conversation, we propose a robotic system that, taking advantage of multiple perceptual capabilities, actively follows a conversation among several human subjects. The essential idea of our proposal is that the robot system can dynamically change the focus of its attention according to visual or audio stimuli to track the actual speaker throughout the conversation and infer her identity.

Similar resources

Triggering Memories of Conversations using Multimodal Classifiers

Our personal conversation memory agent is a wearable ‘experience collection’ system, which unobtrusively records the wearer’s conversation, recognizes the face of the dialog partner and remembers his/her voice. When the system sees the same person’s face or hears the same voice it uses a summary of the last conversation with this person to remind the wearer. To correctly identify a person and h...

MEMN: Multimodal Emotional Memory Network for Emotion Recognition in Dyadic Conversational Videos

Multimodal emotion recognition is a developing field of research which aims at detecting emotions in videos. For conversational videos, current methods mostly ignore the role of inter-speaker dependency relations while classifying emotions. In this paper, we address recognizing utterance-level emotions in dyadic conversations. We propose a deep neural framework, termed Multimodal Emotional Memo...

Achieving Multimodal Cohesion during Intercultural Conversations

How do English as a lingua franca (ELF) speakers achieve multimodal cohesion on the basis of their specific interests and cultural backgrounds? From a dialogic and collaborative view of communication, this study focuses on how verbal and nonverbal modes cohere together during intercultural conversations. The data include approximately 160-minute transcribed video recordings of ELF interactions ...

Meta-Classification of Multimedia Classifiers

Combining multiple classifiers is of particular interest in multimedia systems, since there is usually data of very different types/modalities that should be mined or analyzed. Our wearable ‘experience collection’ system unobtrusively records the wearer’s conversation, recognizes the face of the dialog partner and remembers his/her voice. When the system sees the same person’s face or hears...

BioSec Multimodal Biometric Database in Text-Dependent Speaker Recognition

In this paper we briefly describe the BioSec multimodal biometric database and analyze its use in automatic text-dependent speaker recognition research. The paper is structured into four parts: a short introduction to the problem of text-dependent speaker recognition; a brief review of other existing databases, including monomodal text-dependent speaker recognition databases and multimodal biom...

Publication date: 2009